A novel approach in continuous speech recognition for Vietnamese, an isolating tonal language
نویسندگان
چکیده
This paper proposes a new approach for the integration of the Vietnamese language characteristics into a Large Vocabulary Continuous Speech Recognition System (LVCSR) which was built for some European languages. Firstly, a new module of tone recognition using Hidden Markov model was constructed. Secondly, several methods were applied to transform a text corpus of monosyllabic words into text corpus of polysyllabic words and a statistical language model of polysyllabic words was built by using the new text corpus. Finally, all the knowledge has been included in the LVCSR system so that this system can be adapted for Vietnamese. Experiments are made on the VNSPEECHCORPUS. The results show that the accuracy of Vietnamese recognition system was increased, 46% of relative reduction of the word error rate is obtained by using Vietnamese language characteristics.
منابع مشابه
Large vocabulary continuous speech recognition for Vietnamese, an under-resourced language
This paper proposes a method to build a Vietnamese Large Vocabulary Continuous Speech Recognition system (Vietnamese LVCSR system). The difference between Vietnamese and European languages is analyzed and used to adapt a LVCSR system for European languages to Vietnamese. Experiments are implemented on the VNSPEECHCORPUS. The results show that the accuracy of Vietnamese recognition system is inc...
متن کاملLarge Vocabulary Continuous Speech Recognition for Vietnamese, a Under-resourced Language
This paper proposes a method to build a Vietnamese Large Vocabulary Continuous Speech Recognition system (Vietnamese LVCSR system). The difference between Vietnamese and European languages is analyzed and used to adapt a LVCSR system for European languages to Vietnamese. Experiments are implemented on the VNSPEECHCORPUS. The results show that the accuracy of Vietnamese recognition system is inc...
متن کاملHMM-based TTS for hanoi vietnamese: issues in design and evaluation
This paper presents the development and evaluation of an HMM-based TTS system for the modern Hanoi dialect of Northern Vietnamese, a tonal language. A study of specific phonetic and prosodic features of Hanoi Vietnamese is discussed. Consequences on the design of an HMM-based TTS system are derived. Using this knowledge, a TTS system, called VTed, is then developed under the Mary TTS platform. ...
متن کاملFirst steps in building a large vocabulary continuous speech recognition system for Vietnamese
This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...
متن کاملImproved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008